[Draft] feat(btree): Intro b-tree global index and add tests for java compatibility. #212

Open
ChaomingZhangCN wants to merge 35 commits into alibaba:main from ChaomingZhangCN:btree-global-index

Contributor

ChaomingZhangCN commented Apr 8, 2026

Purpose

Linked issue: close #38

Tests

Note: Google Test filter = BTree*
[==========] Running 47 tests from 6 test suites.
[----------] Global test environment set-up.
[----------] 7 tests from BTreeIndexMetaTest
[----------] 7 tests from BTreeFileFooterTest
[----------] 16 tests from BTreeGlobalIndexerTest
[----------] 7 tests from BTreeGlobalIndexWriterTest
[----------] 5 tests from BTreeGlobalIndexIntegrationTest
[----------] 5 tests from BTreeCompatibilityTest
[----------] Global test environment tear-down
[==========] 47 tests from 6 test suites ran. (26 ms total)
[ PASSED ] 47 tests.

API and Format

Documentation

Generative AI tooling

Generated-by: Claude Code

ChaomingZhangCN and others added 13 commits March 3, 2026 10:05
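# Please enter a commit message to explain why this merge is necessary, especially if it merges an updated upstream
# into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts the commit.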
- Add BtreeGlobalIndexWriter for writing btree global index files
- Fix AllNonNullRows() compilation errors:
  - Use GetLongCardinality() instead of Cardinality()
  - Use AddRange(Range(0, total_rows - 1)) instead of AddRange(0, total_rows)
- Add unit tests for btree file footer, index meta, writer, and indexer
- Add integration test for btree global index
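The AllNonNullRows() fix in this commit suggests that Range is inclusive at both ends, which is why the upper bound is total_rows - 1. The sketch below illustrates that reading with hypothetical stand-ins (InclusiveRange and RowIdSet are not the project's Range or RoaringNavigableMap64 types); the guard for total_rows <= 0 is an addition of the sketch, not something taken from the PR.

```cpp
#include <cstdint>
#include <set>

// Hypothetical stand-in for an inclusive row-id range [from, to].
struct InclusiveRange {
    int64_t from;
    int64_t to;
};

// Hypothetical stand-in for a 64-bit row-id bitmap, backed by std::set.
class RowIdSet {
public:
    void Add(int64_t id) { ids_.insert(id); }
    // Adds every id in [r.from, r.to], both ends included.
    void AddRange(const InclusiveRange& r) {
        for (int64_t id = r.from; id <= r.to; ++id) ids_.insert(id);
    }
    // Removes every id that is present in other.
    void AndNot(const RowIdSet& other) {
        for (int64_t id : other.ids_) ids_.erase(id);
    }
    int64_t Cardinality() const { return static_cast<int64_t>(ids_.size()); }
    bool Contains(int64_t id) const { return ids_.count(id) > 0; }

private:
    std::set<int64_t> ids_;
};

// All rows in [0, total_rows) minus the rows recorded as null.
RowIdSet AllNonNullRows(int64_t total_rows, const RowIdSet& null_rows) {
    RowIdSet result;
    if (total_rows <= 0) return result;  // avoid building the range [0, -1]
    result.AddRange(InclusiveRange{0, total_rows - 1});
    result.AndNot(null_rows);
    return result;
}
```

With an inclusive range, passing total_rows as the upper bound would add one row id past the end of the file, so the - 1 matters.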
# Conflicts:
#	src/paimon/CMakeLists.txt
#	src/paimon/common/global_index/CMakeLists.txt
#	src/paimon/common/io/cache/cache_key.h
#	src/paimon/common/sst/block_cache.h
- Add B-tree compatibility test to ensure data compatibility with Java implementation
- Implement B-tree global index writer with proper file format
- Add integration tests for B-tree global index
- Refactor SST block footer to sort lookup store footer
- Update file index reader to support B-tree format
- Add comprehensive test data for compatibility verification

Co-Authored-By: Claude Opus <noreply@anthropic.com>
- Change BIGINT from string format to 8-byte little-endian binary format
- Change TINYINT from string format to 1-byte binary format
- Change SMALLINT from string format to 2-byte little-endian binary format
- Update compatibility test data to match new binary format
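For reference, an 8-byte little-endian encoding of a BIGINT key could look like the sketch below. EncodeInt64LE and DecodeInt64LE are hypothetical helper names, not functions from this PR, and the sketch says nothing about how the index compares keys.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Encode a 64-bit signed integer as 8 bytes, least significant byte first.
std::array<uint8_t, 8> EncodeInt64LE(int64_t value) {
    uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));  // reinterpret without UB
    std::array<uint8_t, 8> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<uint8_t>((bits >> (8 * i)) & 0xFF);
    }
    return out;
}

// Decode 8 little-endian bytes back into a 64-bit signed integer.
int64_t DecodeInt64LE(const std::array<uint8_t, 8>& in) {
    uint64_t bits = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        bits |= static_cast<uint64_t>(in[i]) << (8 * i);
    }
    int64_t value = 0;
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

The shifts make the byte order independent of the host's endianness. Note that raw little-endian bytes do not sort in numeric order when compared byte by byte, so a reader has to decode values before comparing them.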
ChaomingZhangCN force-pushed the btree-global-index branch 2 times, most recently from 8159962 to 319d889 on April 8, 2026 08:12
Collaborator

lxy-9602 commented Apr 8, 2026

Impressive work on this complex feature. Thanks for the contribution — review coming up.

1. Fix modernize-use-auto warnings
   - Replace explicit type declarations with auto when initializing with template casts
   - Fixed 13 warnings in btree_global_indexer.cpp
   - Fixed 5 warnings in btree_global_indexer_test.cpp

2. Fix AddressSanitizer alloc-dealloc-mismatch errors
   - Replace Bytes::AllocateBytes with std::make_shared<Bytes>
   - Avoid memory pool allocated objects being freed by operator delete
   - Fixed 13 memory allocation/deallocation mismatches

3. Fix UndefinedBehaviorSanitizer null pointer error
   - Add num_bytes > 0 check in MemorySegmentUtils::CopyToBytes
   - Avoid passing null pointer to memcpy when num_bytes is 0

4. Fix modernize-use-default-member-init warning
   - Use default member initializer for file_counter_ in btree_global_index_writer_test.cpp
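Item 3 above can be sketched as follows. SafeCopy is a hypothetical name used only for illustration; the actual change lives in MemorySegmentUtils::CopyToBytes, whose body is not shown in this conversation.

```cpp
#include <cstddef>
#include <cstring>

// Copy num_bytes from src to dst, skipping the call entirely for empty copies.
// Under the C and C++ standards most toolchains target today, passing a null
// pointer to memcpy is undefined behavior even when the size is 0, which is
// what UndefinedBehaviorSanitizer reports.
void SafeCopy(void* dst, const void* src, std::size_t num_bytes) {
    if (num_bytes == 0) return;
    std::memcpy(dst, src, num_bytes);
}
```

An empty buffer often has a null data pointer, so the early return is what keeps a zero-length copy from reaching memcpy with a null argument.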
Comment thread src/paimon/common/sst/sst_file_reader.h Outdated
BlockFooter footer(index_block_handle, bloom_filter_handle);
auto slice = footer.WriteBlockFooter(pool_.get());
Status SstFileWriter::WriteSlice(const MemorySlice& slice) {
auto data = slice.ReadStringView();
Collaborator

What's the difference between WriteSlice(slice) and Write(slice)? Seems only one func is enough.

Contributor Author

There's no function named Write(slice).

Comment thread src/paimon/common/memory/memory_segment_utils.cpp Outdated
Comment thread src/paimon/common/lookup/sort/sort_lookup_store_footer.cpp
Comment thread src/paimon/common/sst/sst_file_io_test.cpp Outdated
Comment thread src/paimon/common/io/cache/cache.h
Comment thread src/paimon/common/io/cache/cache.h Outdated
Comment thread src/paimon/common/global_index/btree/btree_index_meta.h
Comment thread src/paimon/common/global_index/btree/btree_index_meta.cpp
Comment thread src/paimon/common/global_index/btree/btree_index_meta.cpp
Comment thread src/paimon/common/global_index/btree/btree_index_meta.cpp Outdated
ASSERT_NE(deserialized, nullptr);

// Verify keys are null
EXPECT_EQ(deserialized->FirstKey(), nullptr);
Collaborator

Prefer ASSERT_* over EXPECT_* when the failure of a condition would make the rest of the test meaningless or lead to undefined behavior (e.g., null pointers, invalid setup).


TEST_F(BTreeIndexMetaTest, SerializeDeserializeWithOnlyFirstKey) {
// Create a BTreeIndexMeta with only first_key (edge case)
auto first_key = std::make_shared<Bytes>("first", pool_.get());
Collaborator

Is there any scenario in which this case could occur?

Collaborator

Also SerializeDeserializeWithOnlyLastKey().

Comment thread src/paimon/common/global_index/btree/btree_file_footer.h
class BTreeFileFooter {
public:
static Result<std::shared_ptr<BTreeFileFooter>> Read(MemorySliceInput& input);
static MemorySlice Write(const std::shared_ptr<BTreeFileFooter>& footer, MemoryPool* pool);
Collaborator

For out param, please use * rather than &.
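A minimal sketch of the pointer convention requested here, using hypothetical types (ByteCursor stands in for MemorySliceInput and ReadByte is not a function from this PR):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical cursor over a byte buffer, standing in for MemorySliceInput.
struct ByteCursor {
    std::string data;
    std::size_t pos = 0;
};

// Both parameters are modified, so both are passed by pointer. The call site
// then reads ReadByte(&cursor, &value), which makes the mutation visible.
bool ReadByte(ByteCursor* input, uint8_t* out) {
    if (input == nullptr || out == nullptr) return false;
    if (input->pos >= input->data.size()) return false;  // nothing left to read
    *out = static_cast<uint8_t>(input->data[input->pos++]);
    return true;
}
```

The trade-off is that a pointer can be null, so the callee needs a check (or a documented precondition) that a reference parameter would not.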

Comment thread src/paimon/common/global_index/btree/btree_file_footer.cpp
Comment thread src/paimon/common/global_index/btree/btree_file_footer_test.cpp
Comment thread src/paimon/common/global_index/btree/btree_file_footer_test.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_file_footer_test.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_file_footer_test.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_index_factory.cpp
Comment thread src/paimon/common/global_index/btree/btree_global_index_factory.cpp
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.h Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_index_writer.h Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_index_writer.h
Comment thread src/paimon/common/global_index/btree/btree_global_index_writer.cpp Outdated
RoaringNavigableMap64 result;
result.AddRange(Range(0, total_rows - 1));
result.AndNot(*null_bitmap_);
return result;
Collaborator

If row ids are not in order (only the keys are sorted), we can't rely on short-circuiting logic, because the full range of row ids is unknown.

Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp Outdated
Comment thread src/paimon/common/global_index/btree/btree_global_indexer.cpp Outdated
Comment thread src/paimon/common/global_index/CMakeLists.txt
Development

Successfully merging this pull request may close these issues.

[Feature] Support btree global index.

3 participants